The final judgment: classification

k-nearest neighbor

| Class   | Precision | Recall |
|---------|-----------|--------|
| My 2019 | 0.3333333 | 0.24   |
| My 2020 | 0.3555556 | 0.32   |
| NL 2019 | 0.2000000 | 0.22   |
| NL 2020 | 0.1250000 | 0.16   |

As the confusion matrix and the scores above show, the k-nearest neighbor classifier performs badly: recall is below 0.25 for three of the four playlists, which is no better than guessing among four equally sized classes. However disappointing, this result is not surprising. Since we have seen no big differences between the four playlists overall, it makes sense that a track's nearest neighbors are often not from the ‘correct’ playlist.
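To make these scores concrete, here is a minimal Python sketch of how per-class precision and recall follow from a confusion matrix. It is not the code behind my results: the counts are made up for illustration, the function name `precision_recall` is hypothetical, and I assume rows hold the true playlists and columns the predicted ones.

```python
def precision_recall(matrix, labels):
    """Return {label: (precision, recall)} for a square confusion matrix.

    Assumption: matrix[i][j] counts tracks whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positives = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column sum
        actually_k = sum(matrix[k])                     # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        scores[label] = (precision, recall)
    return scores


# Made-up counts for two playlists of 50 tracks each (illustration only).
labels = ["My 2020", "NL 2020"]
matrix = [
    [20, 30],  # true "My 2020": 20 predicted correctly, 30 as "NL 2020"
    [20, 30],  # true "NL 2020": 20 predicted as "My 2020", 30 correctly
]
scores = precision_recall(matrix, labels)
```

Precision asks how many of the tracks assigned to a playlist really belong to it; recall asks how many of a playlist's own tracks were recognized as such.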

Random forest

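First random forest classifier

| Class   | Precision | Recall |
|---------|-----------|--------|
| My 2019 | 0.2884615 | 0.30   |
| My 2020 | 0.2941176 | 0.30   |
| NL 2019 | 0.1428571 | 0.14   |
| NL 2020 | 0.1875000 | 0.18   |

Second random forest classifier

| Class   | Precision | Recall |
|---------|-----------|--------|
| My 2019 | 0.2857143 | 0.28   |
| My 2020 | 0.3636364 | 0.40   |
| NL 2019 | 0.3333333 | 0.36   |
| NL 2020 | 0.3095238 | 0.26   |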

The first random forest classifier performs about as badly as k-nearest neighbor, with precision and recall at or below 0.30 for every playlist. The second one does better: its scores range from 0.26 to 0.40. Even so, the classifiers are still not satisfactory and their results might even be considered insignificant. The second classifier only considers the following features: tempo, A, and all timbre features except c01 and c04. The plot below shows that these selected features are indeed more important than the others.

Just an average girl?

The corpus I am going to analyze consists of four playlists: my Top Tracks of 2019, my Top Tracks of 2020, the Top Tracks NL of 2019 and the Top Tracks NL of 2020. Respectively, these playlists represent the four groups I will be comparing: my taste in music in 2019, my taste in music in 2020, the average Dutch taste in music in 2019, and the average Dutch taste in music in 2020. In this portfolio, I want to find an answer to the following questions:

1. How did my taste in music in 2020 differ from 2019? (comparison between group 1 and 2)

2. How did the average Dutch taste in music in 2020 differ from 2019? (comparison between group 3 and 4)

3. How average was my taste in music in 2019? (comparison between group 1 and 3)

4. How average was my taste in music in 2020? (comparison between group 2 and 4)

I find it very interesting to analyze these comparisons, especially in light of the coronavirus pandemic. Due to the social distancing measures, I did not listen to music in any social setting, such as when hanging out with friends, clubbing, or even working out at the gym. My hypothesis is that my taste in music was therefore less average in 2020 than it was in 2019.

Evidently, my corpus is representative of the groups I want to compare, because it actually consists of those groups. However, I do have to remark that a Top Tracks NL playlist might not be representative of ‘the average’. It just contains the tracks that were listened to most often, possibly only within a certain demographic. Whether I belong to this demographic, I cannot say; I could not find any information about this online.

My personal Top Tracks playlists also need a sidenote or two. First, I do not have a premium Spotify account, which means I get a limited number of skips per hour and most playlists can only be played on shuffle. It might be that a song ended up as a Top Track because it was in a playlist I listened to a lot, not because I liked that song so much. However, these limitations only apply when listening to Spotify on my phone. The desktop version of Spotify does allow for unlimited skips and the freedom to choose songs manually, add songs to the queue, and play a playlist on shuffle or in order.

Moreover, Spotify might be biasing playlists by including certain (possibly sponsored) songs. Also, shuffle might not be completely random, playing popular or sponsored songs first. This way, I would be exposed to more popular songs, which might influence my Top Tracks, possibly making them more average.

A great example is “Stuck with U” by Ariana Grande and Justin Bieber. It is one of my Top Tracks of 2020. It used to be in a lot of different playlists I listened to at the time. And yes, I liked that song, but nevertheless, I am fairly sure there were other songs I liked more in 2020. I definitely consider this song atypical for the group ‘my taste in music in 2020’.

A song I consider very typical for my taste in music in 2019 is “Drive and Disconnect” by Nao. I remember listening to this song on repeat when I discovered it, and for a long time after that. A few months later I rediscovered it and fell in love all over again. Even now, it is still one of my favorite songs.

While browsing the API Reference, I found the following variables that seem interesting to analyze: genres, artists, popularity, danceability, energy, valence, speechiness, instrumentalness, key, mode, tempo.

(No) time to relax


An energy-valence scatter plot

(No) time to dance


A danceability-tempo scatter plot

Observations

Valence versus Energy

  • Valence: In Top Tracks NL 2019, valence is spread out very evenly, unlike in Top Tracks NL 2020. In fact, Top Tracks NL 2020 seems to have the inverted valence distribution of my Top Tracks playlists.
  • Energy: In both categories of 2019, there is a bump around 0.7, and both spread out downward quite evenly. The NL Top Tracks of 2020 seem to be the odd one out (again): energy was considerably lower on average and more spread out. The mode, however, was somewhat higher than the bumps of the other playlists.

Tempo versus Danceability

  • Tempo: The tempo distribution in 2019 was pretty much the same in both categories, both peaking at 100 bpm. The peak tempo of My Top Tracks of 2020 has shifted upwards to 105 bpm. The NL Top Tracks of 2020 are totally different from the rest, with no real peaks, and more spread out towards lower (90 bpm) and higher (120 bpm) tempi. The reference lines for the means clearly show that there was a decrease in tempo between the NL Top Tracks of 2019 and 2020.
  • Danceability: My taste in music has remained fairly constant, with the bumps of My Top Tracks of 2019 and My Top Tracks of 2020 both at around 0.75, though the overall distribution is more spread out in 2020 and the mean has slightly decreased. Unlike the other plots, this plot shows that, with respect to danceability, playlists are more similar when grouped by year than when grouped by category (personal versus NL). Not only do the shapes look more similar, the mean danceability also follows the same trend: it was slightly lower in 2020 than in 2019. This could be explained by the fact that, due to social distancing measures, there were fewer occasions to dance. As a result, artists may also have made less music to dance to.

Mode

My Top Tracks playlists clearly contain more tracks in a minor mode (64%) than the NL Top Tracks playlists (49%). Also, there is a slight increase in minor tracks in 2020 compared to 2019.
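Percentages like these can be computed along the following lines. This is only a sketch with made-up values, assuming Spotify's convention that `mode` is 1 for major and 0 for minor; the helper name `minor_share` is hypothetical.

```python
def minor_share(modes):
    """Return the percentage of tracks in a minor mode.

    Assumption: each entry follows Spotify's audio-features convention,
    where mode 1 means major and mode 0 means minor.
    """
    if not modes:
        raise ValueError("need at least one track")
    minor_count = sum(1 for mode in modes if mode == 0)
    return 100 * minor_count / len(modes)


# Made-up example of five tracks: three minor, two major.
example_modes = [0, 1, 0, 0, 1]
share = minor_share(example_modes)  # 60.0
```

Applying the same function to each playlist separately would give one percentage per playlist, which is what a year-on-year comparison needs.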

Repetitive chord progressions and vocal ‘blocks’


Chromagrams per bar and per section

“Sweetie Odo” by Juls featuring Sway Clarke

This song was a typical Top Track of mine in 2020 (see song info). In the bars chromagram, you can clearly see the repetitive chord progression throughout the song – the skips from C to A to G. The sections chromagram clearly shows bright ‘blocks’ at D and C. The D-blocks represent the chorus vocals and the C-block represents the vocals in the bridge. The first few blocks at G, F and D represent the intro.
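Reading a pattern like C, A, G off a chromagram amounts to asking which pitch class is brightest in each bar. The sketch below shows that idea on made-up chroma vectors; it assumes 12 values per bar ordered from C to B, and the helper names are hypothetical rather than part of any library I used.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def strongest_pitch_classes(chroma_per_bar):
    """Return the name of the strongest pitch class in every bar.

    Assumption: each bar is a list of 12 chroma values ordered from C to B.
    """
    names = []
    for chroma in chroma_per_bar:
        if len(chroma) != 12:
            raise ValueError("each chroma vector needs exactly 12 values")
        strongest = max(range(12), key=lambda index: chroma[index])
        names.append(PITCH_CLASSES[strongest])
    return names


def peak_at(name, peak=1.0, floor=0.2):
    """Build a made-up chroma vector with one bright pitch class."""
    chroma = [floor] * 12
    chroma[PITCH_CLASSES.index(name)] = peak
    return chroma


# Made-up bars that mimic a C, A, G pattern played twice.
bars = [peak_at(name) for name in ["C", "A", "G", "C", "A", "G"]]
progression = strongest_pitch_classes(bars)
```

Note that the brightest pitch class is not necessarily the root of the chord; it only summarizes where most of the energy sits in that bar.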

Song info

Under construction

| Feature        | Average   | “Sweetie Odo” |
|----------------|-----------|---------------|
| Key            | character | G Minor       |
| BPM            | 111.8439  | 100           |
| Time Signature | 4         | 4/4           |

Different sounds in different sections


Cepstrograms per bar and per section

“Sweetie Odo” by Juls featuring Sway Clarke

This song was a typical Top Track of mine in 2020 (see song info). Both cepstrograms clearly show the intro and outro in c03. The verses are also brighter in c03. The chorus sections, however, light up in c05. I wonder what this means. The second chorus also lights up in c02, but the first chorus much less so.

Song info

Under construction

Self-similarity matrices


Self-similarity matrices for chroma and timbre

“Sweetie Odo” by Juls featuring Sway Clarke

The chroma matrix clearly shows the intro, verses (light strips), choruses (dark squares) and outro. The timbre matrix shows the outro even more clearly. Interestingly, the chroma of the outro resembles the chroma of the verses. The dark square at the bottom-left corner of the timbre matrix shows the percussion coming in. In general, the clear timbre changes can be ascribed to changes or short pauses in the percussion; even the slight changes just before 60 and 120 seconds each correspond to a short percussion break of about one bar.
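For reference, a self-similarity matrix compares every segment of a track with every other segment. The sketch below is not the code behind the matrices above: it assumes each segment has already been summarized as a feature vector (chroma or timbre) and uses cosine distance, which is one common choice among several. The toy vectors are made up.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero (silent) segment as maximally distant
    return 1 - dot / (norm_u * norm_v)


def self_similarity(segments):
    """Return the full matrix of pairwise distances between segments."""
    count = len(segments)
    return [
        [cosine_distance(segments[i], segments[j]) for j in range(count)]
        for i in range(count)
    ]


# Made-up track with three segments: verse, chorus, verse again.
segments = [
    [1.0, 0.0, 0.0],  # verse
    [0.0, 1.0, 0.0],  # chorus
    [1.0, 0.0, 0.0],  # verse (repeated material)
]
matrix = self_similarity(segments)
```

In this toy example the two verses get a distance of 0 to each other and a distance of 1 to the chorus, which is the kind of block structure that makes sections visible in a plotted matrix.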

The key visualization


I think there is not much interesting to be said about the distribution of the keys in my corpus – maybe because my corpus is too small (200 tracks in total). When plotting different variables against each other, there was no plot that really stood out.

The plot on the left shows the key distribution of 2019 versus 2020, taking the two modalities into account. The following changes can be seen over time:

Achordingly, repetition is key


Both songs have a very tropical vibe. It’s interesting to see that, assuming the chordogram is correct, both songs seem to use the same few chords throughout the whole song. Whereas “Cash” seems to hold the same chord for a few bars, “Loop niet weg” alternates between chords and then repeats that pattern.
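One common way to arrive at a chordogram is to compare each chroma vector with a template for every candidate chord and keep the best match. The sketch below illustrates that idea with plain binary triad templates and a dot-product score; both are simplifying assumptions on my part and not necessarily what produced the plot above. The chroma values are made up and all names are hypothetical.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_template(root, minor=False):
    """Binary 12-value template with the root, third and fifth set to 1."""
    third = 3 if minor else 4  # semitones above the root
    template = [0.0] * 12
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template


def build_templates():
    """Return templates for all 12 major and 12 minor triads."""
    templates = {}
    for root, name in enumerate(PITCH_CLASSES):
        templates[name + ":maj"] = triad_template(root)
        templates[name + ":min"] = triad_template(root, minor=True)
    return templates


def best_chord(chroma, templates):
    """Return the chord whose template has the largest dot product with chroma."""
    def score(name):
        return sum(c * t for c, t in zip(chroma, templates[name]))
    return max(templates, key=score)


templates = build_templates()

# Made-up chroma vector with all its energy on G, A# and D (a G minor triad).
chroma = [0.0] * 12
for pitch_class in (7, 10, 2):
    chroma[pitch_class] = 1.0
chord = best_chord(chroma, templates)  # "G:min"
```

Running `best_chord` on every bar of a song gives a sequence of chord labels; a song that holds one chord for several bars and a song that cycles through a short pattern would then show up as long runs versus a repeating sequence.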

Covid shows no tempo trend


Unfortunately, my laptop cannot handle tempograms… Instead, I have made a density plot to compare the tempi in different playlists of my corpus.

The tempo distribution in 2019 was pretty much the same in both categories, both peaking at 100 bpm. The peak tempo of My Top Tracks of 2020 has shifted upwards to 105 bpm. The NL Top Tracks of 2020 are totally different from the rest, with no real peaks, and more spread out towards lower (90 bpm) and higher (120 bpm) tempi. The reference lines for the means clearly show that there was a decrease in tempo between the NL Top Tracks of 2019 and 2020.